Project Description
Lego is a well-known brand worldwide, famous for its wide range of toys, popular movies, and successful video games. In this project, we will explore an important moment in Lego’s history: the introduction of licensed sets, such as Star Wars, Super Heroes, and Harry Potter.
The launch of Lego’s first licensed series, Star Wars, was very successful and led to many more collaborations with other popular themes. We will imagine that the partnerships team has asked us to analyze this success. Before we begin the analysis, we will review the descriptions of the two datasets we will use, shown below.
The Data
lego_sets.csv
Column | Description |
---|---|
set_num | A unique code for each Lego set. This is very important because a missing value means the set is a duplicate or invalid. |
name | The name of the Lego set. |
year | The year the set was released. |
num_parts | The number of pieces in the set. This is not critical for our analysis, so missing values are okay. |
theme_name | The name of the sub-theme the set belongs to. |
parent_theme | The name of the main theme the set belongs to. This matches the name column in the parent_themes dataset. |
parent_themes.csv
Column | Description |
---|---|
id | A unique code for each parent theme. |
name | The name of the parent theme. |
is_licensed | A Boolean value showing if the theme is licensed or not. |
The Rebrickable dataset contains information about every Lego set ever sold, including set names and the bricks they contain. Although Lego bricks are small, this is a large and rich dataset. In this project, we will use this data along with the pandas library to explore the history of Lego’s licensed sets. We will also calculate what percentage of all licensed sets are themed around Star Wars.
This project will help us understand how licensed partnerships shaped Lego’s product line and contributed to its success.
What percentage of all licensed sets ever released were Star Wars themed?
# Import pandas and read in the DataFrame
import pandas as pd
lego_sets = pd.read_csv('lego_sets.csv')
lego_sets.head()
 | set_num | name | year | num_parts | theme_name | parent_theme |
---|---|---|---|---|---|---|
0 | 00-1 | Weetabix Castle | 1970 | 471.0 | Castle | Legoland |
1 | 0011-2 | Town Mini-Figures | 1978 | NaN | Supplemental | Town |
2 | 0011-3 | Castle 2 for 1 Bonus Offer | 1987 | NaN | Lion Knights | Castle |
3 | 0012-1 | Space Mini-Figures | 1979 | 12.0 | Supplemental | Space |
4 | 0013-1 | Space Mini-Figures | 1979 | 12.0 | Supplemental | Space |
# Drop rows missing set_num, name, or theme_name (missing num_parts is acceptable)
lego_sets_clean = lego_sets.dropna(subset=['set_num', 'name', 'theme_name'])
lego_sets_clean.head()
 | set_num | name | year | num_parts | theme_name | parent_theme |
---|---|---|---|---|---|---|
0 | 00-1 | Weetabix Castle | 1970 | 471.0 | Castle | Legoland |
1 | 0011-2 | Town Mini-Figures | 1978 | NaN | Supplemental | Town |
2 | 0011-3 | Castle 2 for 1 Bonus Offer | 1987 | NaN | Lion Knights | Castle |
3 | 0012-1 | Space Mini-Figures | 1979 | 12.0 | Supplemental | Space |
4 | 0013-1 | Space Mini-Figures | 1979 | 12.0 | Supplemental | Space |
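To make the effect of `dropna(subset=...)` easier to see, here is a minimal sketch on a made-up DataFrame. The `toy_sets` data is hypothetical and only mirrors the column names from `lego_sets.csv`. It assumes we only care about a missing `set_num`, as the data description suggests, and tolerate a missing `num_parts`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for lego_sets.csv; the values are invented for illustration
toy_sets = pd.DataFrame({
    "set_num": ["00-1", None, "0012-1", "0013-1"],
    "name": ["Weetabix Castle", "Unknown", "Space Mini-Figures", "Space Mini-Figures"],
    "num_parts": [471.0, 10.0, np.nan, 12.0],
})

# Count missing values per column before deciding what to drop
missing_before = toy_sets.isna().sum()

# Drop only the rows whose set_num is missing; a missing num_parts is kept
toy_clean = toy_sets.dropna(subset=["set_num"])

print(missing_before)
print(f"{len(toy_sets)} rows before, {len(toy_clean)} rows after")
```

Checking `isna().sum()` first shows which columns actually contain gaps, so the `subset` list can be chosen deliberately rather than by guesswork.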
# Get the names of the licensed parent themes
parent_themes = pd.read_csv('parent_themes.csv')
licensed_themes = parent_themes[parent_themes['is_licensed']]['name']
licensed_themes.head()
7                    Star Wars
12                Harry Potter
16    Pirates of the Caribbean
17               Indiana Jones
18                        Cars
Name: name, dtype: object
# Subset for licensed sets
licensed = lego_sets_clean['parent_theme'].isin(licensed_themes)
licensed_sets = lego_sets_clean[licensed]
licensed_sets.head()
 | set_num | name | year | num_parts | theme_name | parent_theme |
---|---|---|---|---|---|---|
44 | 10018-1 | Darth Maul | 2001 | 1868.0 | Star Wars | Star Wars |
45 | 10019-1 | Rebel Blockade Runner - UCS | 2001 | NaN | Star Wars Episode 4/5/6 | Star Wars |
54 | 10026-1 | Naboo Starfighter - UCS | 2002 | NaN | Star Wars Episode 1 | Star Wars |
57 | 10030-1 | Imperial Star Destroyer - UCS | 2002 | 3115.0 | Star Wars Episode 4/5/6 | Star Wars |
95 | 10075-1 | Spider-Man Action Pack | 2002 | 25.0 | Spider-Man | Super Heroes |
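The subsetting step relies on `parent_theme` matching the `name` column of the parent themes table. As a rough illustration, the sketch below applies the same `isin` filter to invented data and compares it with an alternative based on `merge`. The `sets` and `themes` frames are hypothetical, and the merge variant is not part of the original analysis.

```python
import pandas as pd

# Hypothetical miniature versions of the two datasets
sets = pd.DataFrame({
    "set_num": ["a-1", "b-1", "c-1", "d-1"],
    "parent_theme": ["Star Wars", "Town", "Harry Potter", "Space"],
})
themes = pd.DataFrame({
    "name": ["Star Wars", "Town", "Harry Potter", "Space"],
    "is_licensed": [True, False, True, False],
})

# Approach used in this notebook: build a boolean mask with isin
licensed_names = themes.loc[themes["is_licensed"], "name"]
via_isin = sets[sets["parent_theme"].isin(licensed_names)]

# Alternative: attach is_licensed to every set with a merge, then filter on it
merged = sets.merge(themes, left_on="parent_theme", right_on="name", how="inner")
via_merge = merged.loc[merged["is_licensed"], ["set_num", "parent_theme"]]

print(via_isin)
print(via_merge)
```

Both routes should select the same sets here. The `isin` mask keeps the original row index, while the merge resets it, which may matter if later steps rely on index alignment.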
# Calculate the percentage of licensed sets that are Star Wars themed
all_sets = len(licensed_sets)
star_wars_sets = len(licensed_sets[licensed_sets['parent_theme'] == 'Star Wars'])
ratio = star_wars_sets / all_sets
# int() truncates toward zero, so the result is a whole-number percentage
the_force = int(ratio * 100)
print(f'The percentage of licensed sets that are Star Wars themed is {the_force}%.')
The percentage of licensed sets that are Star Wars themed is 51%.
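The same share can also be computed as the mean of a boolean mask. The sketch below uses an invented `toy_licensed` frame (not the real data) to show that approach, and to show how truncating with `int()` can differ from rounding.

```python
import pandas as pd

# Hypothetical licensed sets: 4 of the 6 rows are Star Wars
toy_licensed = pd.DataFrame({
    "parent_theme": [
        "Star Wars", "Star Wars", "Harry Potter",
        "Star Wars", "Super Heroes", "Star Wars",
    ],
})

# The mean of a boolean Series is the fraction of True values
is_star_wars = toy_licensed["parent_theme"] == "Star Wars"
share = float(is_star_wars.mean())

truncated = int(share * 100)   # drops the fractional part
rounded = round(share * 100)   # rounds to the nearest whole percent

# Shares for every theme at once, as fractions that sum to 1
shares_by_theme = toy_licensed["parent_theme"].value_counts(normalize=True)

print(truncated, rounded)
print(shares_by_theme)
```

With this toy data the truncated and rounded percentages differ by one point, so it is worth stating which convention a reported figure uses.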
In which year was the highest number of Star Wars sets released?
# Create a pivot table of sets released by theme per year
licensed_pivot = licensed_sets.pivot_table(index='year', columns='parent_theme', values='set_num', aggfunc='count')
# Find the year when the most Star Wars sets were released
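star_wars_by_year = licensed_pivot["Star Wars"].sort_values(ascending=False)
# Read the peak year from the top of the sorted Series instead of hard-coding it
new_era = int(star_wars_by_year.index[0])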
print(f'The year when the most Star Wars sets were released was {new_era}.')
The year when the most Star Wars sets were released was 2016.
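One way to read the peak year directly from a pivot table is `idxmax()`, which returns the index label of the largest value. The sketch below applies it to a small invented dataset. `toy_licensed` and its values are hypothetical, so the year it prints says nothing about the real data.

```python
import pandas as pd

# Hypothetical licensed sets with release years
toy_licensed = pd.DataFrame({
    "set_num": ["a-1", "b-1", "c-1", "d-1", "e-1", "f-1"],
    "year": [1999, 1999, 2000, 2001, 2001, 2001],
    "parent_theme": [
        "Star Wars", "Star Wars", "Star Wars",
        "Star Wars", "Harry Potter", "Harry Potter",
    ],
})

# Count sets per year and theme, as in the pivot table above
toy_pivot = toy_licensed.pivot_table(
    index="year", columns="parent_theme", values="set_num", aggfunc="count"
)

# idxmax skips NaN and returns the first index label holding the maximum
peak_year = int(toy_pivot["Star Wars"].idxmax())
peak_count = int(toy_pivot["Star Wars"].max())

print(f"Peak year: {peak_year} with {peak_count} sets")
```

If two years tie for the maximum, `idxmax()` reports only the first one, so a tie would need a separate check.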